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Abstract 

We give a mathematical framework for Exact Milestoning, a recently 
introduced algorithm for mapping a continuous time stochastic process 
into a Markov chain or semi-Markov process that can be efficiently sim¬ 
ulated and analyzed. We generalize the setting of Exact Milestoning and 
give explicit error bounds for the error in the Milestoning equation for 
mean first passage times. 


1 Introduction 

Molecular Dynamics (MD) simulations, in which classical equations of motions 
are solved for molecular systems of significant complexity, have proven useful for 
interpreting and understanding many chemical and biological phenomena (for 
textbooks see [471 EH E]). However, a significant limitation of MD is of time 
scales. Many molecular processes of interest occur on time scales significantly 
longer than the temporal scales accessible to straightforward simulations. For 
example, permeation of molecules through membranes can take hours |5] while 
MD is usually restricted to the microseconds time scale. One approach to ex¬ 
tend simulation times is to use faster hardware |48l |49l |45] . Other approaches 
focus on developing theories and algorithms for long time phenomena. Most 
of the emphasis has been on methodologies for activated processes with a sin¬ 
gle dominant barrier, as in Transition Path Sampling [Miziiin]- Approaches 
for dynamics on rough energy landscapes, and for more general and/or diffu¬ 
sive dynamics, have also been developed gminiiiiEniiis]- The techniques of 
Exact Milestoning [5] and Milestoning [22] belong to the last category. They 
are theories and algorithms to accelerate trajectory calculations of kinetics and 
thermodynamics in complex molecular systems. The acceleration is based on 
the use of a large number of short trajectories instead of complete trajectories 
between reactants and products (Figure [^. The simulation of short trajectories 
is trivial to implement in parallel, making the formulation efficient to use on 
modern computing resources. Moreover, the use of short trajectories makes it 

* Department of Mathematics, Colorado State University, Fort Collins, CO 
Institute for Computational Engineering and Sciences, University of Texas at Austin, 
Austin, TX 

^Institute for Computational Engineering and Sciences, Department of Chemistry, Univer¬ 
sity of Texas at Austin, Austin, TX 


1 



possible to enhance sampling of improbable but important events by initiating 
the short trajectories near bottlenecks of reactions. A challenge is how to start 
the short trajectories, and how to analyze the result to obtain correct long time 
behavior. 

While Milestoning is an approximate procedure, it shares the same philoso¬ 
phy and core algorithm as the Exact Milestoning approach. In both algorithms 
the phase space is partitioned by hypersurfaces, which we call milestones 
Men, into cells. The short trajectories are initiated on milestones and are 
terminated the first time they reach a neighboring milestone (Figure [^. The 
short trajectories can be simulated in parallel. 

Milestoning uses an approximate distribution for the initial conditions of 
the trajectories at the hypersurfaces. The results are then analyzed within the 
Milestoning theory. The approximation is typically the (normalized) canoni¬ 
cal distribution restricted to the milestone interface M. In Exact Milestoning 
the distribution of hitting points at the interface is estimated numerically by 
iteratively computing trajectory fragments between milestones. In a straight¬ 
forward implementation of the iterations (see also 0) the final phase points of 
trajectories that were terminated on one milestone are continued until they hit 
another milestone. This type of trajectory continuation procedure is also used 
in Non-Equilibrium Umbrella Sampling |55j and Trajectory Tilting [53j . The 
continuation does not mean that full trajectories from reactants to products are 
computed. The calculations stop when the stationary distribution at the inter¬ 
face, or observables of interest, converge. In practice, and depending of course 
on the initial guess, the calculation ends significantly earlier than complete tra¬ 
jectories from Rto P are computed. The fast convergence of the iterations leads 
to significant computational savings. 

A number of other algorithms build on the use of short trajectories to esti¬ 
mate long time kinetics by “patching” these short trajectories at milestones or 
interfaces. These technologies include the Weighted Ensemble (WE) [57l|32], 
Transition Interface Sampling (TIS) [52], Partial Path Transition Interface Sam¬ 
pling (PPTIS) [33], Forward Flux Sampling (FES) [2|, Non-Equilibrium Um¬ 
brella Sampling (NEUS) [SS], Trajectory Tilting |S3|, and Boxed Molecular Dy¬ 
namics (BMD) |25j . Some of these techniques are similar; however, many 
subtle differences remain. Some of the differences are as follows. WE is the 
only method that makes it necessary to use stochastic dynamics. The trajec¬ 
tory sampling in NEUS, Trajectory Tilting and Exact Milestoning is similar, 
even though the theories are quite different. Exact Milestoning allows for the 
calculations of all the moments of the first passage time |S], a result which is 
not available for other technologies. Boxed Molecular Dynamics, Milestoning 
and PPTIS are approximate methods leading to greater efficiency. TIS, PPTIS, 
FES, and BMD are focused on one-dimensional reaction coordinates. Other 
technologies (e.g. WE, Milestoning, NEUS, and Trajectory Tilting) focus on a 
space of one or several coarse variables. 

Hence, the overall scopes of these techniques differ significantly, which make 
direct comparison between them less obvious. We have compared in the past the 
accuracy and efficiency of the methods of Milestoning and Exact Milestoning 
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with Forward Flux [siiini. Forward Flux is one of the closest algorithms (in 
one dimension) to Milestoning and Exact Milestoning. Numerous examples of 
kinetics of molecular systems studied with Milestoning were published [ami 
[5S1 IMl [551 [5n [5DI1551115] . We have also discussed extensively the features of 
alternate technologies that exploit trajectory fragments |551 HD] . 

The Milestoning theory has not yet been subject to rigorous mathematical 
analysis, which is the goal of the present manuscript. In this manuscript we 
show that the Exact Milestoning method can be derived and analyzed in the 
framework of probability theory. The result is a useful link between physical in¬ 
tuition and a more formal approach. Readers that are interested in the efficiency 
of the algorithm on concrete examples, and comparison to other technologies, 
are referred to the sources mentioned the above paragraph. 



Figure 1: Representation of the state space fl and the milestones. Each mile¬ 
stone is one of the line segments traced by dashed grey lines. The reactant state 
R is highlighted as a square dot in the left-bottom corner while the product 
state P is comprised of the two line segments shown in blue at the upper-right 
corner. A particular realization of a long trajectory appears as a continuous 
black line and the corresponding values of (Jn) are marked with dots. 

This article is organized as follows. In Section we describe the setting 
for Exact Milestoning and introduce notation used throughout. In Section]^ 
we show existence of and convergence to a stationary flux under very general 
conditions. In Section]^ we state precisely the Exact Milestoning algorithm [^. 
In Section]^ we establish conditions under which convergence to the stationary 
flux is consistent in the presence of numerical error (Lemmaand Theorem]^, 
and we give a natural upper bound for the numerical error arising in Exact 
Milestoning (Theorem]^. Finally, in Section we consider some instructive 
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examples. 


2 Setup and notation 

2.1 The dynamics and MFPT 

In Milestoning we spatially coarse-grain a dynamics {Xt). The basic idea is 
to stop and start trajectories on certain interfaces, called milestones, and then 
reconstruct functions of (Xt) using these short trajectories, which can be effi¬ 
ciently simulated in parallel. We assume here the dynamics is stochastic, and 
focus on using Milestoning for the efficient computation of mean first passage 
times (MFPTs) of (Xt), although similar ideas can be used to compute other 
non-equilibrium quantities. 

To make our arguments we need some assumptions on (Xt). We let (Xt) be 
a time homogeneous strong Markov process with cadlag paths taking values in 
a standard Borel space fl. These assumptions allow us to stop and restart (Xt) 
on the milestones without knowing its history. In applications, usually (W) is 
Langevin or overdamped Langevin dynamics, and 17 is a subset of Euclidean 
space. 

We write P, E for all probability measures and expectations, with super¬ 
scripts P"® (resp. P^) to indicate a starting point x (resp. distribution ^). The 
symbol ^ will indicate equality in probability law. We will use the words dis¬ 
tribution and probability measure interchangeably. Total variation norm will be 
denoted by || • \\tv- Our analysis below will mostly take place in an idealized 
setting where we assume inhnite sampling on the milestones. In this setting, 
distributions are smooth (if state space is continuous) and the total variation 
norm is the appropriate one. 

Recall we are interested in computing the MFPT of (Xt) from a reactant 
set i? to a product set P. Throughout we consider fixed disjoint product and 
reactant sets P,RC 17. When R is not a single point, we will start (Xt) 
from a fixed probability measure p on R. If i? is a single point, p = Sr, the 
delta distribution at R. As discussed above, Milestoning allows for an efficient 
computation of the MFPT of {Xt) to P, starting at p. It is useful to think of 
P as a sink, and P as a source for {Xt). More precisely, we assume that when 
{Xt) reaches P, it immediately restarts on R according to p. Obviously, this 
assumption has no effect on the MFPT to P. It will be useful, however, for 
computational and theoretical considerations. 

Many of the results below follow from well-known theorems in probability 
theory. However, because of the special source-sink structure of {Xt), simpler 
proofs are often available, and we include them for clarity and completeness. 

2.2 The milestones and semi-Markov viewpoint 

We write M C 17 for the space of milestones used for parallelizing the com¬ 
putation of the MFPT. Each point x € M belongs to a milestone C M. 
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Thus, M is the union of all the milestones. We assume there are finitely many 
milestones, each of which is a closed set. Moreover, we demand that {Xt) passes 
through the intersection of two milestones with probability 0 ~ thus, {Xt) can 
only cross one milestone at a time. This can be accomplished for Langevin or 
overdamped Langevin dynamics by taking the milestones to be codimension 1 
with pairwise intersections of codimension 2 or larger; see Figure The sets P 
and R will be two of the milestones. We always start {Xt) on M. 

By following the sequence of milestones crossed by {Xt), we obtain a sequence 
of points {Jn) in M. See Figurej^ We now describe (J„) more precisely. Let 
be the nth milestone crossing time for {Xt), defined recursively by 0o = 0 and 

if Xg^ = X, then 6n+i := inf{t > '■ Xt G My for some My ^ M^}- 

Note that by a milestone crossing, we mean a crossing of a milestone different 
from the previous one. The sequence of milestone crossing points is Jn = Xg^. 

We show now that {Xt) can be partially reconstructed from (J„) and {9n)- 
Let {Yt) be definecQby setting Yt = Jn whenever On < t < On+i- Then {Xt) 
and (Yt) agree at each milestone crossing time t = On, n = 0,1,2,... and (Yt) 
is obtained from {Xt) by throwing away the path of {Xt) between milestone 
crossings, keeping only the endpoints. It follows that {Xt) and (Yt) have the 
same MFPT to P. Thus, for our purposes it is enough to study (Yt). We note 
that (Yt), like {Xt), immediately restarts at p upon reaching P. 

By our assumptions above, {Jn) is a Markov chain on M, and (Yt) is a 
semi-Markov process on M, meaning it has the Markov property at crossing 
times. We write K{x,dy) for the transition kernel of (J„). Thus, if the initial 
distribution of {Jn) is Jq ^ (, then the distribution at time n is P^(Jn € ■) = 
We also write := E^[/(J„)] and (f := f{x) ({dx) for suitable 

functions /. 

The following notation will be needed. For x G M, define local first passage 
times 

= inf{t > 0 : Yt G My for some My ^ M^}. 

Thus, tIj is the first time for (Yt) to cross some milestone other than M^, 
starting at Tq = a:- In particular, if Xg^_.^ = x, then ~ 0„_i -|- r^. We also 
define rp to be the first time to cross P and ap the number of crossings before 
reaching P, 

Tp = inf{t > 0 : Yt G P}, crp = min{n > 0 : J„ S P}. 

We are interested in E'’[rp], the MFPT from p to P. 

^When (Yt) has a probability density, it corresponds to the density p{x, t) from [5] for the 
last milestone point passed. 
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3 Invariant measure and MFPT 


3.1 Stationary distribution on the milestones 


The MFPT will be estimated via short trajectories between milestones. An 
important ingredient is the correct starting distribution for these trajectories. 
Exact Milestoning makes use of a stationary flux of (Xt) on the milestones, 
which correspond^ to the stationary distribution /i of (J„). It is worth noting 
that Milestoning can also be made exact by choosing milestones as isocommittor 
surfaces [SI] . The advantage of the formulation here is that the milestones can 
be arbitrary. 

Some assumption is required to guarantee the existence of a stationary flux. 
We adopt the following sufficient condition, which we assume holds throughout: 

E^jrp] and E^[crp] are finite for all probability measures ^ on M. 


This ensures that (Yt) reaches P in finite expected time and does not have 
infinitely many milestone crossings in finite time. The condition can be readily 
verified in the standard settings for milestoning discussed above. Using this 
assumption and the source-sink structure of the dynamics - namely, that (Yt) 
immediately restarts at p upon reaching P - we show in Theorem below that 
p, exists. 

Theorem 1. (J„) has an invariant probability measure p defined by 


p{-) :=E^ 




_n=0 


E^ap + l]-!. 


where = 1 if Jn & C and otherwise = 0. 

Proof. Dehne i'{-) = E^ •}] observe that 


K-) = EE e • I CTp = n)P^(crp = n) = E ^ o-p > n). 

n—0 m—0 n—0 

If C n i? = 0, by bounded convergence, 

p oo oo 

/ v{dx)K{x,C) = E & C, ap > n) = G C, ap > n) = v{C), 

J M -n -n 


n—0 


n—0 


where the second equality uses P^(Jo € C) = 0 and Jn+i ^ R ^ ap ^ n. If 
C C R, 

/ OO OO 

p{dx)K{x,C) = E G U, cTp = n) -I- E GC, ap>n + l) 

n=0 n=0 

OO 

= p(C) - PP(Jo e C, ap > 0) + E €C,ap>n) = iy(C). 


n—0 


^Our is the same as the appropriately normalized stationary flux q in other Milestoning 
papers. We use /i instead of q to emphasize that here it is a probability measure, not a density. 
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□ 


We will show below that (J„) converges to /r under appropriate conditions. 
In that case /r is unique and we will call /r the stationary distribution of (Jn). 
A successful application of Exact Milestoning will require some technique for 
sampling /r. The algorithm we present (Algorithm 1 below) is based on conver¬ 
gence of the distribution of (J„) to /i in total variation. We demonstrate this 
convergence in Theorem under an additional assumption on (J„). 

It is worth noting that the proof of Theorem leads to the following repre¬ 
sentation of /i as a Neumann series. The representation is given in Corollary 
below. This representation can be used, in principle, to sample fi without ad¬ 
ditional assumptions on (J„). The Neumann series is written in terms of the 
transient kernel 


( 1 ) 


K{x,dy) 


K{x, dy), X ^ P 
0, X £ P 


K{x,dy) corresponds to a modified version of (J„) that is absorbed (killed) on 

P. 


Corollary 2. We have 

n—1 

(2) lim ||:/(M)-^ Vpi?* - ^IItf = 0. 

n—)-oo 

2=0 

Proof. Recall that y = v/v{M) where 

OO OO 

z/(.) = ^ P^( J„ G •, ap > n) = ^ pK^. 

Moreover, 


n=0 


n=0 


(3) 


sup 

I/I<1 


n—1 


p(M)-i ^ pK^f - pf 


2=0 


<p(A/)-i^ P^((Tp > i), 


and the right hand side of ([^ is summable since by assumption E^[crp] < oo. □ 


3.2 Milestoning eqnation for the MFPT 

Equipped with an invariant measure p, we are now able to state the Milestoning 
equation Q for the MFPT. In Exact Milestoning, this equation is used to 
efficiently compute the MFPT. The algorithm is based on two principles: first, 
many trajectories can be simulated in parallel to estimate for various a;; 
and second, the stationary distribution p can be efficiently estimated through 
a technique based on power iteration. See the right hand side of equation Q 
below. 
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The gain in efficiency comes from the fact that the trajectories used to 
estimate are much shorter than trajectories from R to P. Whether we can 
efficiently sample /i may depend somewhat on whether we have a good initial 
guess. When {Xt) is Langevin dynamics, we have found in some cases the 
canonical Gibbs distribution is a sufficiently good guess. See [5] and [B] for 
details and discussion. 

Theorem 3. Let fi he defined as above. Then > 0 and 

( 4 ) fi{P)EP[Tp]= [ /r(dx)E"[r^] :=E^[tm]. 

Jm 

Proof. The assumption E^[crp] < oo shows that /i(f’) > 0. For any x G M, 

E"[tp]= [ E^[Tp\Yr^^=y]K{x,dy) 

Jm 

= / E“ [rf, I = y] K{x, dy) + / E“ [rp - rf, \ = y] K{x, dy) 

JM J M\P 

= E^[tIj]+ ( Ey[Tp]K{x,dy). 

Jm\p 


Thus, 

E^[rp] = E^[rM]+/ f fj,{dx)Ey[Tp]K{x,dy) =Ey-[TM]+f n{dy)Ey[Tp], 

JM\P JM JM\P 

and so 

E^[rM] = j fJ.{dy)Ey[Tp] = /r(P)E^[Tp]. 

□ 

In Section below we present the Exact Milestoning algorithm (Algorithm 
1) recently used in [5] and [6]. The algorithm uses a technique which combines 
coarse-graining and power iteration to sample fi. Consistency of power iteration 
algorithms are justified via Theorem below, where we show y as 

oo. Though we emphasize that there are a range of possibilities for sampling y 
(for example, algorithms based on ([^ or Q below) we note that Algorithm 
1 was shown to be efficient for computing the MFPT in the entropic barrier 
example of [5] and the random energy landscapes example of [6]. 

3.3 Convergence to stationarity 

In this section we justify the consistency of power iteration-based methods for 
sampling y by showing that converges to y in total variation norm as 

n —>■ oo. The theorem requires an extra assumption - aperiodicity of the jump 
chain (J„). 


Theorem 4. Suppose that [Jn) is aperiodic in the following sense: 

(5) g.c.d. {n > 1 : P^(tTp = n — 1) > 0} = 1. 

Then for all probability measures ^ on M, 

(6) lim ||P^(J„ e •) - mIItv = lim -/iHry = 0. 

n—>-oo n—>-oo 

In particular, yt is unigue. 

Proof. We use a simple coupling argument. Let (Hn) be an independent copy 
of (Jn) and let Jq ^ ^ and Hq ^ yi. For n > 0, let S'„ (resp. T„) be the times at 
which {Jn) (resp. (Hn)) hit P for the {n + l)st time. Then iSn+i — Sn, u > 0, 
are iid random variables with finite expected value and nonlattice distribution, 
and (5'n+i - Sn)n>o ~ {Tn+i - Tn)n>0- It follows that {Sn - Tn)n >0 is a mean 
zero random walk with nonlattice step distribution. Thus, its first time to hit 
0 is finite almost surely. So 

C := infjn >0 : S P, iL„ G P} 

obeys P(C > n) —)• 0 as n —>■ oo. Note that ~ whenever f < n. Thus 
|P«(J„ G C) - ¥>^{Hn G C)| < 2P(C > n). 

Since yi is stationary for {Hn) we have ¥^^{Hn G C) = /i(C'). Now 

||P«(J„ G •) - mIItv = sup |P«(J„ G C) - yi{C)\ < 2P(C > n), 

CCM 

which establishes the convergence result. To see uniqueness, suppose ^ is an¬ 
other invariant probability measure for (Jn); then the last display becomes 
11^ — yJ'WTV < 2P((( > n). Letting n —)• oo shows that ^ ~ /r. □ 

We now consider a class of problems where there is a smooth one-dimensional 
reaction coordinate ^ [Oj 1] tracking progress of (W) from R to P. In 

this case V'lfi = 0, tp\p = 1, the milestones Mi,..., Mm are disjoint level sets 
of Ip, and R = Mi, P = Mm- The jump chain (J„) can only hop between 
neighboring milestones, unless it is at P. That is, if J„ G Mi for i ^ {l,m}, 
then Jn+i G Mi-i or Jn G M^+i; if Jn G Mi then J„+i G M 2 ; and if Jn G M^, 
then J„+i G Ml. Suppose that if Jn G Mi for i ^ {1,to}, then J„+i G Mi_i 
with probability in (0,1). Then the aperiodicity assumption ([^ is satisfied if 
and only if m is odd. This is due to the fact that, if Jq G Mi, then Jm-i € Mm 
and Jm+i G Mm with positive probability, and m and m -I- 2 are coprime when 
m is odd. On the other hand, if m is even then the conclusion of Theorem 
cannot hold. To see this, let m be even and suppose Jq is supported in an odd- 
indexed milestone. Then J 2 „ is always supported on an odd-indexed milestone, 
while J 2 n+i is always supported on an even-indexed milestone. 

Theorem estabishes convergence the distribution of J„ to yt in total vari¬ 
ation norm. Even when (J„) is not aperiodic, it converges in a time-averaged 
sense. Thus, problems in sampling yi arising from aperiodicity can be managed 
by averaging over time. More precisely, we have the following version of the 
Birkhoff ergodic theorem: 
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Theorem 5. Let Jq ^ with ^ a probability measure on M. For bounded 
measurable / : M —>■ K, 


1 ”_} r 

(7) lim - V/(Ji) “=■ / fdp = nf. 

Proof. Let S'„ be the times at which (J„) hits P for the (n + l)st time, and 
define 

S„+i 

/n= E 

i=S„ + l 

Note that /„, n > 0, are iid. Let k{n) = maxjfc : Sk < n} and write 


1 

n 


E 

2=0 


So . /i-(n)-l n 

-E/(-^*) + - E /^ + - E /(•^*) 

r? ■'i—^ 71 ^ 71 ^ 


2 = 0 


2 = 0 


^ —5'fc(Tl)+l 


Since (Jn) hits P in finite time a.s., n — Sk(n) Sq are finite a.s. Thus, 


- n—1 

lim - E fi^i) = 1™ 

r>—^rvi 71 f ^ n.— 


fc(n) — 1 1 


k{n) — l 


H f‘- 


n-foo n kin) — 1 

z=0 ' ' i=0 

Notice Rn '■= iS'n+i — S'„, n > 0 are iid with finite expectation and 

i?o + ■ • ■ + Rk(n)-i ^ n — So ^ i?o + • • • + Rk{n) 
kin) ~ kin) ~ kin) 

By the previous two displays and the law of large numbers, 


r, E fiM = + ii-‘ I:e'i/(j.)1 


i=0 


E[i?, 


i=0 


jfL 

i/(M) 




with v defined as in Theorem a □ 

Markov chains for which the conclusion of Theorem a hold are called Harris 
ergodic. It is worth noting that a slightly stronger aperiodicity condition leads 
to a limit for the distribution of Yt. More precisely, suppose ([^ holds and for 
each X G M \ P and y G M, P“^(ri G ■\Ji = y) \s nonlattice. Then for any 
C C M and y-a.e. x. 


( 8 ) 


lim ¥^iYt G C) 

t—¥OQ 


JclJ-idy)¥y[Tlj] 

]Ml^{dy)¥y[Tli]' 


See a detail^ and proof. 

^When the right hand side of a has a density, it is the same as the stationary probability 
density p{x) in [5] for the last milestone point passed. 
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4 Exact Milestoning algorithm 

We now describe in detail an algorithm for sampling /r and the MFPT E^[rp], 
used successfully in [5] and [6]. We assume throughout this section that the 
conclusion of Theorem holds. Let ^ be an initial guess for (If {Xt) is 
Brownian or Langevin dynamics, we usually take ^ to be the canonical Gibbs 
distribution.) We write Mi for the distinct milestones, so that M = UiMi. The 
algorithm will produce approximations 

of Let be the non-normalized restriction of to Mi, and define 

J Mi 


For C C Mj we will also use the notation 


a^\c) = [ icf^-^\dx)K{x,C). 

J Mi 

Below we think of a-”^ and as either distributions or densities. The a-"^ 
are obtained from trajectory fragments between milestone crossings. A sim¬ 
ple Monte Carlo scheme for estimating these distributions is as follows. Let 
Xi,... ,xl be iid samples from the distribution Starting at 

each xe G Mi, simulate (At) until it crosses the next milestone, say at the point 
yi G Mj. If we idealize by assuming the simulation of (At) is done exactly, then 
by Chebyshev’s inequality. 


aif{C)- j ^^5y,{dy) 
Jc ^ 


“12 


> e < 


\C)-a%\Cf 

Le2 


where dy is the Dirac delta distribution at y. We therefore write, for y G M, 


(9) 


Ml 

^ij 


ah- (j/) 


e=i 


where dy^ is either some suitable approximation to the identity at yi, or simply 
a delta function at yi. Thus, in Algorithm 1 we think of and as 

either densities in the former case, or as distributions in the latter. The local 
mean first passage times (i.e., the times between successive milestone crossings) 
are approximated by the sample means 




1 

L 


L 
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Algorithm 1 Exact Milestoning algorithm. 

Input: Milestones M = UjLiMj, initial guess and tolerance £ > 0 for the 
absolute error in the MFPT. 

Output: Estimates for /i, local MFPTs and overall MFPT E^[tp]. 

^ +00 

for all n = 1, 2,... do 
for i = 1 to TO do 

Estimate and E ^^^ ' [tm] 

* 1 ? ^ 

end for 

Solve w^A = w’’ (with A = (a|”^) € and w = (u>i,..., Wm) G K>q) 


for j = 1 to TO do 


G- 

end for 


E m 

i=l ' 




Normalize 

T(") ^/i(P)-lE^‘"-^'[TM] 
if |t(") < £ then 


break 
end if 
end for 

return [tm],7’*'"^) 


It is important to realize that we do not need to store the full coordinates of 
each yi in memory. Instead, it suffices to use a data-structure that keeps track 
of the pairs {y£,Mj). The actual coordinates of each point can be written to 
disk and read from it as needed. 

The eigenvalue problem in Algorithm involves a stochastic matrix A S 
I^mxm jg sparse. Indeed, the i-th row corresponds to milestone Mi and may 
have only as many non-zero entries as the number of neighboring milestones Mj. 
In practice, to solve the eigenvalue problem we can use efficient and accurate 
Krylov subspace solvers [55] such as Arnold! iteration [35] to obtain w without 
computing all the other eigenvectors. 

In Algorithm ^ if := is used instead of the solution w to 

w^A = w^, then the algorithm approximates /r by simple power iteration, = 
^AT”. The reason for defining the weights as the solution to w^A = is 
practical: we have found that it gives faster convergence of the iterations, at 
no apparent cost to accuracy. It can be seen as a version of power iteration 
that uses coarse-graining. See mm for applications of the algorithm in Exact 
Milestoning and [351135| for related discussions. 

Finally, we mention the fact that pseudo-random number generators (PRNGs) 
can only produce a finite amount of pseudo-random numbers. Once the maxi¬ 
mum amount is reached, the generators may silently reuse the previous random 
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numbers in the same order. It has been noted m that this phenomenon leads 
to unphysical artifacts in simulations. The simplest approach to properly use 
PRNGs (and avoid the aforementioned artifacts altogether) consists of reseeding 
the generator from time to time, obtaining the new seeds from high-quality en¬ 
tropy sources such as those available in modern computer hardware (see [T7| [35] 
for more details). 


5 Error analysis 


5.1 Stationary distribution error 


In practice, due to time discretization error, we cannot generate trajectories 
exactly according to the transition kernel K. Instead, we can generate trajec¬ 
tories according to a numerical approximation K^. We investigate here whether 
such schemes are consistent, that is, whether powers of of K converge to a 
distribution /Xj « We emphasize that, even though we account for time dis¬ 
cretization here, we still assume infinite sampling, and thus for a given x € M, 
K^{x,dy) may be a continuous distribution. See Section 5.2 below for related 
remarks and a discussion of how time discretization errors affect the Exact Mile- 
stoning estimate of the MFPT. 

The following theorem, restated from [23] , establishes consistency of iteration 
schemes based on Theorem when is sufhciently close to K and (J„) is 
geometrically ergodic. After the theorem, in Lemmaand Theorem]^ we give 
natural conditions for geometric ergodicity of (J„). 


Theorem 6. Suppose (Jn) is geometrically ergodic: there exists k G (0,1) such 
that 

sup II^AT” - pl\\tv = 0{k^). 

x^M 

Let {ATe} he a family of stochastic kernels with Kq = K, assumed to act contin¬ 
uously on B, such that 


(10) lim sup ||A:,/ - AT/lloo = 0. 

Then for each k G {n, 1), there is 5 > 0 such that for each e G [0, S), has a 
unigue invariant probability measure /ie, and 


sup sup - fJ-ellrv = 0{kf^), 

£<S x^M 

lim |lAi£ - bWtv = 0. 

£->■0 

Geometric ergodicity is inconvenient to check directly. We give two sufficient 
conditions for geometric ergodicity of (J„). The first condition is a uniform 
lower bound on the probability to reach P in steps; see Lemma We use 
this to obtain a strong Feller condition in Theorem [^ The latter is a very 
natural condition and is easy to verify in some cases, for instance when (Xt) is 
a nondegenerate diffusion and the milestones are sufficiently regular. 
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Lemma 7. Suppose that there exists A G (0,1) and N G N such that for all 
X € M, P^(JAr-i G P) > A > 0. Then {Jn) is geometrically ergodic: 

sup||4if”-M||Ty<A-i(l-A)L-/^J. 

xGM 

Proof. Let ^ 1,^2 G V, consider the signed measure C = Ci ~ C 2 and compute 


- f 2 K^\\TV = sup f f ^{dy)K^{y,dz)f{z) 

|/|<i JmJm 

f f i{dy) {K^iy, dz) - Xp{dz)) f{z) 

JM JM 

i(.dy) [ {K^{y, dz) - Xp{dz)) f(z) 

JM 


= sup 
I/I<1 


= sup 
I/I<1 


< (1 — A) sup 
l/l<i 


f M 

i{dy)f{y) 


IM 


= (1 - ^)IICi - 6llry- 


The last line uses the fact that K^{y,dz) — Xp{dz) is a positive measure. This 
shows that is a contraction mapping on V with contraction constant (1 —A). 
Observe also that \\^iK — ^ 2 K\\tv < ll^i — ^ 2 \\tv- The result now follows from 
the contraction mapping theorem. See for instance Theorem 6.40 of [18]. □ 

Note that the A in Lemma Q is a quantity that can be estimated, at least 
in principle, by running trajectories of (Xt) starting at x which cross iV — 1 
milestones before reaching P. However, this is likely impractical for the same 
reason direct estimation of the MFPT is impractical - the trajectories would be 
too long. One alternative would be to compute the probability P^{J^_i G P) 
for the Markov chain (J^) on {1 ,...,to} with transition matrix A, and use 
the minimum over i S {1, •. ■ ,rn} as a proxy for A. Even without a practical 
way to estimate A, we believe the characterization of Lemma [T] is useful for 
understanding the convergence rate. 

Lemma [pleads to the following condition for geometric ergodicity of (J„). 

Theorem 8. Suppose that M is compact and {Jn) is a strong Feller chain which 
is aperiodic in the sense of (§. Then {Jn) is geometrically ergodic. 

Proof. Let e G (0,/r(P)). By Theorem]^ for each x G M there is G N such 
that P^(Jn G P) > e for all n > N^;. Because (J„) is strong Feller, the map 
X —>■ P'^(Jn G P) is continuous. By compactness of M, it follows that for any 
A G (0, e) there is fV G N such that P^(J„ G P) > A for all y G M and n> N—1. 
Theorem now yields the result. □ 


5.2 MFPT error 

As discussed above. Equation can be used to estimate the MFPT [rp] by 
sampling p and local MFPTs The error in this estimate has two sources. 
First, in general we only have an approximation p, = of p. The second source 
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of error is in the sampling of due to the fact that we can only simulate a 
time discrete version {Xnst) of (Xt). In Theorem below we give an explicit 
formula for the numerical error of the MFPT in terms of these two sources. We 
first need the following notation. Let be the minimum of all ndt > 0 such 
that the line segment between X^st and X(n+i)St intersects M \ M^, and define 

■■= f ^idx)E^[fM]- 

J M 

Theorem below gives an expression for the error in the original Milestoning 
as well as in Exact Milestoning. 

Theorem 9. There exists a nonnegative function (j) such that 

lE^rp] - fi{P)-^E^[fM]\ < Cl - TiP)-^\ 

+ ic2\\fl - fiWTV + (l^iSt)) , 


where 

Cl := E>^[tm], C 2 := sup E“[r^]. 

£CGM 

Proof. Note that 

|E^[tp] - fi{P)-^Ef^[fM]\ = HP)-^E^^[tm] - ^{P)-^E^[fM]\ 

< Hpy^E^iTM] - 

+ \fi{P)-^E^[TM] - m(P)-1E^[tm]| 

+ |/i(P)-lE^[TM] - /i(P)-iE^[fM]| 

where we have written E^[tm] := Jm Tidx)E^[Tff]. We may write 

= |E^ [tm] - E^[tm]| 

for the term depending only on time stepping error. Note that 


|E'^[rM]-E''[TM]| = 


/ /i(dx)E"[r^]- / Kdx)E^[Tf,] 
Jm Jm 


< ( sup E"^[r^] 

VkGM 


I!m-. 


ITIC- 


Combining the last three expressions yields the result. 


□ 


Recall that in the above we have ignored errors from finite sampling. We 
now discuss the implications of those errors. In the original Milestoning, fl is 
the canonical Gibbs distribution on the milestones. In that setting, we can 
typically sample independently from jj, on the milestones. Thus, the central 
limit theorem implies that the true error in the Milestoning approximation of 


E^[tp] is bounded above with high probability by the right hand side of (111 
plus a constant times I/^/N, where N is the number of samples. An analogous 
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argument applies to Exact Milestoning if p, is sampled by simple power iteration. 
For our coarse-grained version of power iteration in Algorithm however, we 
obtain samples of p which are not independent, and thus a more detailed analysis 
would be required to determine the additional error from finite sampling. 

We do not analyze the time discretization error 4>{6t) and instead refer the 
reader to m and references therein. Here we simply remark that, if (Xt) is a 
diffusion process, then under certain smoothness assumptions on the drift and 
diffusion coefficients of (Xt) and on M, we have when {XnSt) is 

the standard Euler time discretization with time step 6t. See [26] for details and 
proof. See also [SI in] for numerical schemes that mitigate time discretization 
error in the MFPTs. 


6 Illustrative examples 


In this section we discuss two examples of Milestoning to illustrate the method. 
We consider the solution, (Xt), of the Brownian dynamics equation, 


( 12 ) 


dXt = -XU{Xt) dt -b ^2/3-1 dBt, 

Xo^p 


where [/: H —>■ M is a smooth potential energy function, /3 > 0 is the inverse 
temperature, and (i?*) is a standard Brownian motion. 


6.1 Miiller-Brown potential 

We begin with a system characterized by the Miiller-Brown potential [U] . The 
energy function ?7: H C —>■ K is given by the formula (see also the corre¬ 
sponding energy landscape in Figure]^ 

U{xi,X2) = 

- 170e“T (2 :i-i-5)Vii + 

+ 15era (^1+1)''+! (®i-n)(a;2-i)-K^ (2;2 -i)\ 


This system is a commonly used benchmark for numerical methods for obtaining 
reaction rates. 

We chose to partition H using a Voronoi tessellation (displayed in Figures]^ 
and[^ generated from a set of points gathered by the method of locally updated 
planes m- However, any other set of points could have been chosen (as we shall 
discuss in the next example). Figure]^ also shows our choice of reactant and 
product milestones. 

For the numerical experiments to be detailed below, we solve the stochastic 
differential equation in (121 using the Euler-Maruyama scheme [42] with a time 
step length At = 10“® at a temperature determined by /3“^ = 5. We use the 
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Figure 2: Graph of the Muller-Brown potential energy function. The milestones 
are shown as the overlaid line segments. 


number of force evaluations as a measure of the computational cost of our meth¬ 
ods and we note that the Euler-Maruyama method requires one force evaluation 
per time step. 

We compared two types of experiments that we now describe. The first ex¬ 
periment consists of running Brownian dynamics trajectories started at the re¬ 
actant milestone until they reach the product milestone. As soon as a trajectory 
reaches the product, we initiate a new trajectory from the reactant milestone 
and so on. We refer to these as long trajectories. In the second experiment we 
run Exact Milestoning starting the first iteration with exactly one phase space 
point at each of the milestones along the reaction path. Next, we run ten short 
trajectories per milestone per iteration. These short trajectories start at each 
milestone and stop whenever they reach any neighboring milestone, as described 
in Section m 

Despite allowing the long trajectories to go on for approximately 2.5 x 10® 
force evaluations, only seven reach the product milestone. This leads to a poor 
approximation of the mean first passage time. By contrast, running the Exact 
Milestoning method for approximately 2 x 10® force evaluations, we obtain good 
estimates of the stationary distribution /i and the local mean first passage times. 

The values of are displayed in Eigures|^and[^ The empirical distri¬ 

butions corresponding to ^ on some of the milestones are shown in Figure]^ 

Figures]^ and [^illustrate the non-equilibrium nature of Exact Milestoning. 
The stationary distribution that we compute differs noticeably from the equi¬ 
librium (canonical) distribution. Recall from Figures]^ and [^ that the reactant 
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Figure 3: Milestones represented as line segments. Some of the milestones are 
labeled and the reactant is colored in green while the product is shown in blue. 



Figure 4: Phase space points in the empirical distributions of the stationary flux 
fi for two types of simulations: long trajectories using straight-forward Brownian 
dynamics (left) and Exact Milestoning (right). Despite the fact that the two 
types of simulations involved comparable amounts of computational effort, we 
see that the sampling in Exact Milestoning is much more exhaustive. 


milestone, M 7 , is located at the lower-right minimum while the product mile¬ 
stone, Mq, is at the global minimum in the upper-left side of the graph. With 
this in mind, we see that trajectories initiated at My arrive at the intermediate 
minimum located close to the center of the graph and many of those trajectories 
return to the lower minimum, crossing My again. This results in high values of 
/r(Mg) at the transition state, while the density at Mg (the global minimum) 
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Figure 5: Values of at some of the milestones in the Miiller-Brown po¬ 

tential. The values correspond to the long trajectories (in red) and to Exact 
Milestoning (in blue), as discussed in Section 6.1 Not shown are the mile¬ 


stones other than Mi for i = 0,... 7, where the sampling obtained from the long 
trajectories is insufficient for comparison. 
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Figure 6: Empirical distributions of the stationary flux obtained by long trajec¬ 
tories (in red) and Exact Milestoning (in blue) corresponding to the system in 
Section [O] Notice that the sampling of the long trajectories is very sparse at 
the milestones close to the product. 


is signihcantly lower. Equilibrium considerations, which are inappropriate here, 
would suggest that most of the stationary density (and the stationary probabil¬ 
ity) is concentrated at the global minimum and that the weight at the transition 
state would be small. 

6.2 Rough energy landscape 

In this case, we present an example of Milestoning on the torus 

For our computations, we consider a uniformly spaced mesh of milestones with 

fixed product and reactant sets P,R; see Figure]^ We model a rough energy 
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landscape by a potential energy function of the form: 

N N 

(13) U{xi,X2)=Re E 

ki = -N k2=-N 

where Re denotes the real part of a complex number and iV G N is a constant 
that tunes the ruggedness of the potential. Each coefficient Zk^^k 2 = ^ki,k 2 + 
i&fci.fc 2 G C is determined by the random variables aki,k 2 and which are 

distributed according to 


J c, with probability ^, 
lO, with probability i, 

with c itself being a uniform random variable in the interval (—1,1). Since N 
is fixed, a particular realization of the coefficients specihed above completely 
determines the potential energy function U. The graph of the canonical density 
of a potential energy of the form discussed above is shown in Figure Notice 


* i 

(M 


Xi 


Figure 7: Graph of the density of the canonical distribution corresponding to a 
rough energy landscape with TV = 7 at temperature /3“^ = 1. 


20 


that this class of energy functions generalizes the model for rough landscapes 
introduced by [SS] and that similar potential energy functions have been used 
to model Wigner glasses [1]. 

We carry out Exact Milestoning in this example by solving boundary value 
problems, as described in [5]. The resulting stationary density obtained after 
convergence is shown in Figure 



Xl 


Figure 8: Diagram showing the reactant state (orange square) and the product 


state (blue square) within the set of all milestones for the example in Section 6.2 


Each milestone is an edge of one of the small squares in the diagram. (The total 
number of milestones has been decreased to enhance visibility.) 


It is interesting to note that it has been argued [Ml that an optimal choice 
of milestones would consist of using the level sets (also called isocommittors) 
of the committor function. These surfaces (see Figure [To|) are typically hard to 
compute in practice, which makes the use of Exact Milestoning more appealing, 
as its results are independent of how the milestones are set up. 


7 Conclusions 

The main goal of this manuscript is to present a rigorous mathematical deriva¬ 
tion, based on probability theory, of Exact Milestoning. While the theory of 
Exact Milestoning and accompanying numerical examples were discussed else¬ 
where ISIE!, the mathematical formulation in the earlier paper was not as rig¬ 
orous as in this manuscript. Once this formulation is established, it opens the 
way for further communication between chemical physicists and mathemati¬ 
cians, and it bridges the gap between the communities for further development 
of an important tool for computer simulation. 

Exact Milestoning belongs to a class of enhanced sampling methods for the 
calculation of kinetics. Most closely related approaches to Milestoning are the 
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Figure 9: Stationary density on the rough energy landscape of Section |6.2[ 
The contour lines are the level sets of U. There are 2 x 40 x 40 total milestones 
(shown as the segments in the overlaid grid). 


Non-Equilibrium Umbrella sampling |55j and Trajectory Tilting |53j . The way 
in which trajectories are sampled is similar in all of these methods; however, 
the theoretical frameworks are different. For example, Milestoning allows the 
calculation of all the moments of the first passage time (FPT) distribution [^, 
and hence better estimates of the FPT can be constructed, a result that was 
not reported for other methods. 

It is not necessary in Milestoning to establish or rely on metastability to esti¬ 
mate the average transition time. From this perspective the method is different 
from another exact approach —Transition Path Sampling |16j — that exploits 
the short duration of rare trajectories between metastable states. Exact Mile¬ 
stoning makes the sampled trajectories short by sampling trajectory fragments 
between boundaries of phase space cells or milestones. The statistics of short 
trajectories between milestones make it possible to investigate wide ranges of 
types of energy landscapes, which may be corrugated or not, as illustrated in 
the two examples in this manuscript. We have shown in the present manuscript 
that Exact Milestoning is both accurate and highly efficient. 


22 





Figure 10: Isocommittor surfaces for the rough energy landscape discussed in 
Section 16.21 


It is important to emphasize that the choice of the milestones in Exact Mile- 
stoning is arbitrary from a formal viewpoint. Efficiency considerations suggest 
that it is beneficial to select them following two criteria: (i) the milestones 
should be sufficiently close in the kinetic sense to make the trajectories short, 
and (ii) milestones should be chosen to make the number of iterations as small 
as possible. For example, the number of iterations can be small if the system is 
close to equilibrium and the milestones are expressed in a space of slow variables. 
Then an initial choice of the canonical distribution is quite accurate. 

We comment that the method of Milestoning that was broadly used in the 
past (e.g., [35]) is approximate and assumes local equilibrium within the mile¬ 
stones. While corrections and further refinements were proposed [401 isolES], 
these approximations cannot be made exact and are similar in spirit to the 
local equilibrium and lag time approximations of Markov State Models M- 
Nevertheless, these approximations can be accurate with a proper choice of 
coarse variables. These types of approximations are very useful as the system 
grows in complexity and size and exact calculations become prohibitively ex¬ 
pensive. Milestoning made it possible to investigate kinetics of enzymes m 
and transport through membranes |12j in agreement with experimental obser¬ 
vations. These are systems of tens to hundreds of thousands of particles and 
time scales of milliseconds. It will be of considerable interest to re-evaluate these 
approximations for large systems with the method of Exact Milestoning. As the 
efficiency of Exact Milestoning increases with faster hardware, we are breaking 
scale barriers that were not accessible before to atomically detailed simulations. 
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